Quantum circuits for general multi-qubit gates 
We consider the minimal elementary gate sequence which is needed to implement a general quantum gate acting on n qubits, a unitary transformation with $4^n$ degrees of freedom. For synthesizing the gate sequence, a method based on the so-called cosine-sine matrix decomposition is presented. The result is optimal in the number of elementary one-qubit gates, $4^n$, and requires $4^n - 2^{n+1}$ CNOT gates, which scales more favourably than previously reported decompositions.
The foundation of quantum computation [1] involves the encoding of computational tasks into the temporal evolution of a quantum system. To this end, a register of n qubits, i.e., identical two-state quantum systems, is employed. Quantum algorithms can be described by unitary transformations and projective measurements acting on the $2^n$-dimensional state vector of the register. In this context, unitary transformations are also called quantum gates. The recently discovered quantum algorithms [2, 3, 4] embody arbitrary unitary transformations and hence call for techniques to efficiently implement a general n-qubit gate. The complexity of an implementation is measured in terms of the number of elementary gates required [5]. Achieving gate arrays of lower complexity is crucial not only because it generally results in shorter execution times, but also because it may introduce fewer errors.

Any finite-dimensional unitary transformation can be represented as a unitary matrix, and hence any n-qubit gate corresponds to a certain $2^n \times 2^n$ unitary matrix U. Therefore, the powerful methods of matrix computation can be utilized to produce quantum gate decompositions. However, only decompositions yielding matrices which correspond to gate sequences of low complexity are of interest. We choose the library of elementary gates to consist of the controlled-NOT (CNOT) gate, the one-qubit rotations about the y and z axes, and a phase gate adjusting the unobservable global phase. Since the cost of physically realizing a CNOT gate may exceed that of a one-qubit gate, we count the numbers of these gates separately.

A general unitary $2^n \times 2^n$ matrix U has $4^n$ real degrees of freedom. Since each elementary one-qubit gate carries one degree of freedom, at least $4^n$ such gates are needed to implement U. The current theoretical lower bound for the number of CNOT gates needed in realizing an arbitrary n-qubit gate, $\lceil \frac{1}{4}(4^n - 3n - 1) \rceil$, is given in Ref. [7]. However, no circuit construction yielding these numbers of CNOT or elementary one-qubit gates has been presented in the literature. The conventional approach [5] to implementing general multi-qubit gates makes use of the QR decomposition [6] for unitary matrices, yielding an array of $O(n^3 4^n)$ elementary gates. Heretofore, the most efficient implementation based on the QR decomposition requires, for asymptotically large n, approximately $8.7 \cdot 4^n$ CNOT gates [8]. In addition, the synthesis of optimal quantum circuits for certain special classes of gates has been intensively studied. The implementation of a general two-qubit gate [7, 9, 10, 11] is found to require 3 CNOTs and 16 elementary one-qubit gates. For a three-qubit gate, the current minimal implementation, using 40 CNOTs and 98 elementary one-qubit gates [12], is based on the Khaneja-Glaser decomposition (KGD) [13]. Furthermore, an implementation of an arbitrary diagonal unitary matrix involving $2^n - 2$ CNOTs and $2^n$ elementary one-qubit gates is known [14].
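The two lower bounds quoted above can be tabulated in a few lines. This is only an illustrative sketch: the function names are hypothetical, and the CNOT bound is simply the formula $\lceil \frac{1}{4}(4^n - 3n - 1) \rceil$ from Ref. [7] evaluated in exact integer arithmetic.

```python
def one_qubit_lower_bound(n: int) -> int:
    """Parameter count: a general 2^n x 2^n unitary has 4^n real degrees of
    freedom, and each elementary one-qubit gate carries exactly one."""
    return 4 ** n


def cnot_lower_bound(n: int) -> int:
    """ceil((4^n - 3n - 1) / 4), computed with integer floor division so that
    no floating-point rounding can creep in for large n."""
    return -(-(4 ** n - 3 * n - 1) // 4)


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"n={n}: >= {one_qubit_lower_bound(n)} one-qubit gates, "
              f">= {cnot_lower_bound(n)} CNOTs")
```

For n = 2 the CNOT bound evaluates to 3, which agrees with the three-CNOT two-qubit circuits cited above.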

In this Letter, we present an efficient implementation of a general unitary transformation U by recursively utilizing the cosine-sine decomposition (CSD) [15]. In the context of quantum computation, the CSD was first considered in Ref. [16], and its relation to the KGD has recently been discussed in Ref. [17]. We decompose U into a product of matrices, each of which is identified with a new type of gate which we call a uniformly controlled rotation. To implement these gates, we present an efficient elementary gate sequence which is related to the gates recently explored in Ref. [14] as part of the implementation of a diagonal quantum computer.

Figure 1: Definition of the uniformly controlled rotation $F_4^3(R_{\mathbf a})$. Here $\mathbf a$ is a three-dimensional vector fixing the rotation axis of the matrices $R_{\mathbf a}^j = R_{\mathbf a}(\alpha_j)$.

Let $F_m^k(R_{\mathbf a})$ denote a uniformly controlled rotation. It consists of k-fold controlled rotations of qubit m about the three-dimensional unit vector $\mathbf a$, one rotation for each of the $2^k$ different classical values of the control qubits. The index m may acquire the values $1, 2, \ldots, n$ and k the values $1, 2, \ldots, n-1$. An example of $F_4^3(R_{\mathbf a})$, where m = 4 and k = 3, is shown in Fig. 1. The relative order of the controlled rotations is irrelevant: each rotation acts only when the control qubits hold its own classical value, so the gates commute. For instance, the uniformly controlled rotation $F_{k+1}^k(R_{\mathbf a})$ has the matrix representation

$$F_{k+1}^k(R_{\mathbf a}) = \mathrm{diag}\big(R_{\mathbf a}(\alpha_1), R_{\mathbf a}(\alpha_2), \ldots, R_{\mathbf a}(\alpha_{2^k})\big), \tag{1}$$

where the angles $\alpha_1, \alpha_2, \ldots, \alpha_{2^k}$ may be freely chosen and the rotation matrix $R_{\mathbf a}(\phi)$ is given by



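$$R_{\mathbf a}(\phi) = e^{i\,\mathbf a\cdot\boldsymbol\sigma\,\phi/2} = I\cos\frac{\phi}{2} + i\,\mathbf a\cdot\boldsymbol\sigma\,\sin\frac{\phi}{2}. \tag{2}$$

Above, $I$ is the unit matrix and the product $\mathbf a\cdot\boldsymbol\sigma = a_x\sigma_x + a_y\sigma_y + a_z\sigma_z$ involves the Pauli matrices $\sigma_x$, $\sigma_y$, and $\sigma_z$. In general, $F_m^k(R_{\mathbf a})$ is a product of $2^k$ two-level matrices.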
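Equation (2), together with the additivity and angle-negation properties used in the construction below, can be checked numerically. The following sketch uses plain Python complex arithmetic. The helper names (`rot`, `matmul`, `close`) are illustrative, and the axis $\mathbf a$ is assumed to be a unit vector.

```python
import math

# Pauli matrices as nested lists (2x2)
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2)]
            for r in range(2)]


def rot(a, phi):
    """R_a(phi) = I cos(phi/2) + i (a . sigma) sin(phi/2) for a unit axis a = (ax, ay, az)."""
    ax, ay, az = a
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[(c if r == q else 0) + 1j * s * (ax * SX[r][q] + ay * SY[r][q] + az * SZ[r][q])
             for q in range(2)] for r in range(2)]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(A[r][c] - B[r][c]) < tol for r in range(2) for c in range(2))


a = (0.0, 0.6, 0.8)  # unit axis perpendicular to x
# rotations about one axis add up: R_a(0.3) R_a(0.5) = R_a(0.8)
assert close(matmul(rot(a, 0.3), rot(a, 0.5)), rot(a, 0.8))
# sigma_x R_a(phi) sigma_x = R_a(-phi) because a_x = 0
assert close(matmul(matmul(SX, rot(a, 0.7)), SX), rot(a, -0.7))
```

The negation identity holds only because $\sigma_x$ anticommutes with $\sigma_y$ and $\sigma_z$. For an axis with $a_x \neq 0$ the second check fails, which is why the construction below requires $a_x = 0$.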



We propose an implementation of $F_m^k(R_{\mathbf a})$ with $a_x = 0$ using an alternating sequence of $2^k$ CNOTs and $2^k$ one-qubit rotations $R_{\mathbf a}(\theta_i)$ acting on the qubit m. The position of the control node in the lth CNOT gate is set to match the position where the lth and (l+1)th bit strings $g_{l-1}$ and $g_l$ of the binary reflected Gray code differ (the code is cyclic, so $g_{2^k} \equiv g_0$). In binary Gray codes, adjacent bit strings differ by definition in only a single bit, and hence the position is well defined. As an example, the quantum circuit for the gate $F_4^3(R_{\mathbf a})$ is shown in Fig. 2(a), while Fig. 2(b) illustrates the correspondence of the Gray code to the positions of the control nodes in the CNOT gates.
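The placement rule for the control nodes can be sketched as follows. The sketch assumes the standard construction $g_l = l \oplus \lfloor l/2 \rfloor$ of the binary reflected Gray code. As an illustrative convention not fixed by the text, it numbers the k control bits from the most significant end (position 1). `gray` and `control_positions` are hypothetical helper names.

```python
def gray(l: int) -> int:
    """l-th string of the binary reflected Gray code, g_l = l XOR (l >> 1)."""
    return l ^ (l >> 1)


def control_positions(k: int) -> list:
    """Control-node position (1 = most significant bit) of each of the 2^k CNOTs.

    The l-th CNOT is controlled by the bit in which g_{l-1} and g_l differ;
    the code is cyclic, so g_{2^k} is identified with g_0.
    """
    size = 2 ** k
    positions = []
    for l in range(1, size + 1):
        diff = gray(l - 1) ^ gray(l % size)
        assert diff and diff & (diff - 1) == 0, "adjacent strings differ in one bit"
        positions.append(k - diff.bit_length() + 1)
    return positions


print(control_positions(3))  # [3, 2, 3, 1, 3, 2, 3, 1]
```

Every position occurs an even number of times: position 3 appears four times, and positions 1 and 2 appear twice each. The next paragraph relies on exactly this property.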

In the proposed construction, each of the control qubits regulates an even number of NOT gates, since in a cyclic Gray code each bit is flipped an even number of times. On the other hand, since $\sigma_x$ anticommutes with $\sigma_y$ and $\sigma_z$, Eq. (2) yields

$$\sigma_x R_{\mathbf a}(\phi)\,\sigma_x = R_{\mathbf a}(-\phi), \qquad a_x = 0. \tag{3}$$



Figure 2: (a) Quantum circuit realizing the gate $F_4^3(R_{\mathbf a})$, where $\mathbf a$ is perpendicular to the x-axis. Here we have used the notation $R_{\mathbf a}^j = R_{\mathbf a}(\theta_j)$. (b) Binary reflected 3-bit Gray code used to define the positions of the control nodes. The black and white rectangles denote bit values one and zero, respectively.

Hence, for any of the standard basis vectors acting as an input, all the induced NOT gates annihilate each other and negate some of the angles $\{\theta_i\}$. Furthermore, subsequent rotations about any single axis $\mathbf a$ are additive, i.e., $R_{\mathbf a}(\phi)R_{\mathbf a}(\omega) = R_{\mathbf a}(\phi + \omega)$ for arbitrary angles $\phi$ and $\omega$. Thus, the construction yields a rotation of the qubit m about the axis $\mathbf a$ through an angle which is a linear combination of the angles $\{\theta_i\}$. Consequently, the proposed quantum circuit is equivalent to the gate $F_m^k(R_{\mathbf a})$, provided that the angles $\{\theta_i\}$ are a solution of the linear system of equations

$$M^k \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_{2^k} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_{2^k} \end{pmatrix}, \tag{4}$$



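where the matrix elements $M^k_{ij}$ can be determined using Eq. (3). The rotation angle $\theta_j$ is negated, provided that the control nodes attached to the lth qubit are active and the lth bit of $g_{j-1}$ has the value one. Because the code starts from the all-zero string $g_0$, the lth bit of $g_{j-1}$ records whether the lth control qubit has been used an odd number of times before the jth rotation. The negations must be applied for each control qubit independently, which results in

$$M^k_{ij} = (-1)^{b_{i-1}\cdot g_{j-1}}, \tag{5}$$

where $b_i$ is the standard binary code representation of the integer i and the dot in the exponent denotes the bitwise inner product of the binary vectors.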
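Equation (5) is straightforward to tabulate. In this sketch, rows and columns are indexed from 0, so entry `[i][j]` holds $M^k_{i+1,j+1} = (-1)^{b_i \cdot g_j}$. The bitwise inner product modulo 2 is evaluated as the parity of `i & gray(j)`. The helper names are hypothetical.

```python
def gray(l: int) -> int:
    return l ^ (l >> 1)


def parity(x: int) -> int:
    """Bitwise inner product modulo 2 equals the parity of the AND of the two strings."""
    return bin(x).count("1") % 2


def m_matrix(k: int) -> list:
    """M^k with entries (-1)^(b_i . g_j), i, j = 0, ..., 2^k - 1."""
    size = 2 ** k
    return [[(-1) ** parity(i & gray(j)) for j in range(size)] for i in range(size)]


for row in m_matrix(2):
    print(row)
# [1, 1, 1, 1]
# [1, -1, -1, 1]
# [1, 1, -1, -1]
# [1, -1, 1, -1]
```

Each column is a column of the unnormalized Walsh-Hadamard matrix, taken in Gray-code order. This is the permutation argument used in the next paragraph.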

The matrix $M^k$ bears a strong resemblance to the k-bit Walsh-Hadamard matrix $H^k_{ij} = 2^{-k/2}(-1)^{b_{i-1}\cdot b_{j-1}}$, which is by construction orthogonal. Since a Gray code is a permutation of the standard binary code, $2^{-k/2}M^k$ is a column-permuted version of $H^k$ and thus also orthogonal. Consequently, we obtain the inverse matrix $(M^k)^{-1} = 2^{-k}(M^k)^T$, and the determination of $\{\theta_i\}$ for any desired angles $\{\alpha_i\}$ is immediate. Thus any uniformly controlled rotation $F_m^k(R_{\mathbf a})$ with $a_x = 0$ and $k \ge 1$ can be realized using $2^k$ CNOT gates and $2^k$ one-qubit rotations $R_{\mathbf a}(\theta_i)$. We note that although we chose to use the binary reflected Gray code to determine the positions of the control nodes in the CNOT gates, any cyclic k-bit binary Gray code will also qualify. Furthermore, $F_m^k(R_{\mathbf a})$ can also be achieved by a horizontally mirrored version of the quantum circuit presented.
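The complete recipe can be verified end to end. It first solves Eq. (4) via $\theta = 2^{-k}(M^k)^T\alpha$, then builds the alternating rotation/CNOT sequence, and finally simulates qubit m for every classical value of the controls. The sketch below is self-contained and uses only the standard library. It assumes $a_x = 0$ and represents each control value b and each Gray-code string as an integer with the same bit order. All function names are illustrative.

```python
import math

SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def matmul(A, B):
    return [[sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2)] for r in range(2)]


def rot(a, phi):
    """R_a(phi) from Eq. (2) for a unit axis a."""
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[(c if r == q else 0) + 1j * s * (a[0] * SX[r][q] + a[1] * SY[r][q] + a[2] * SZ[r][q])
             for q in range(2)] for r in range(2)]


def gray(l):
    return l ^ (l >> 1)


def parity(x):
    return bin(x).count("1") % 2


def solve_angles(alphas):
    """theta = 2^-k (M^k)^T alpha, using the orthogonality of 2^(-k/2) M^k."""
    size = len(alphas)
    return [sum((-1) ** parity(i & gray(j)) * alphas[i] for i in range(size)) / size
            for j in range(size)]


def circuit_on_target(b, thetas, a):
    """2x2 operator the gate sequence applies to qubit m when the controls hold value b."""
    size = len(thetas)
    U = [[1, 0], [0, 1]]
    for l in range(1, size + 1):
        U = matmul(rot(a, thetas[l - 1]), U)     # one-qubit rotation R_a(theta_l)
        control = gray(l - 1) ^ gray(l % size)   # control node of the l-th CNOT
        if b & control:                          # control qubit holds 1: NOT acts on qubit m
            U = matmul(SX, U)
    return U


a = (0.0, 0.6, 0.8)                                      # rotation axis with a_x = 0
alphas = [0.1, -0.4, 0.9, 1.3, -2.0, 0.25, 0.7, -1.1]    # k = 3, arbitrary target angles
thetas = solve_angles(alphas)
for b, alpha in enumerate(alphas):
    U, V = circuit_on_target(b, thetas, a), rot(a, alpha)
    assert all(abs(U[r][c] - V[r][c]) < 1e-9 for r in range(2) for c in range(2))
```

The check passes for any choice of target angles. The circuit therefore reproduces every block $R_{\mathbf a}(\alpha_i)$ of the uniformly controlled rotation with $2^k$ CNOTs and $2^k$ rotations.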

The CSD of a unitary $2^n \times 2^n$ matrix may be expressed
where the matrix $(P^i_{2j-1})^\dagger$ is absorbed into the definition of $\tilde U^i_{2j} = (P^i_{2j-1})^\dagger U^i_{2j}$, and $P^i_{2j-1}$ is kept intact.

Finally, the decomposition leads to the result 



where the exact form of the submatrices is given below. The decomposition may be applied recursively to the submatrices of $U_j^i$ until a $2 \times 2$ block-diagonal form is encountered. In our indexing scheme, the upper index denotes the level of recursion, whereas the lower index denotes the position of the matrix within the resulting matrix product. We note that the CSD is not unique, and one should take the possible internal symmetries of the matrix U into account to obtain the simplest achievable form for the matrices $U_j^i$.

In the decomposition, $u_j^{i,k}$ ($k = 1, \ldots, 2^i$) are unitary $2^{n-i} \times 2^{n-i}$ matrices, and the real diagonal matrices $c_j^{i,k}$ and $s_j^{i,k}$ ($k = 1, \ldots, 2^{i-1}$) are of the form $c_j^{i,k} = \mathrm{diag}_l(\cos\theta_l)$ and $s_j^{i,k} = \mathrm{diag}_l(\sin\theta_l)$ ($l = 1, \ldots, 2^{n-i}$). For a general $i = 1, \ldots, n-1$, the matrices $U_j^i$ and $A_j^i$ assume the forms

$$U_j^i = \mathrm{diag}_k\big(u_j^{i,k}\big), \qquad (k = 1, \ldots, 2^i), \tag{7}$$
where the function $\gamma(j) + 1$ indicates the position of the least significant non-zero bit in the n-bit binary representation of the number j. The matrices $P_j^i$ are determined by the preceding matrix. This fixes the order in which the recursion must be applied, since the absorbed matrices $(P_j^i)^\dagger$ affect subsequent decompositions. Thus, the recursion in Eq. (11) is first applied to the matrix $U_j^i$ with the largest upper index and, upper indices being equal, the smallest lower index, subject to the stopping criterion $i = n - 1$.
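The bookkeeping of the recursion can be checked with a short sketch based on the indexing function $\zeta(i,j) = 2^{n-i-1}(2j-1)$ used in this section, with $j = 1, \ldots, 2^{i-1}$ cosine-sine matrices at level i. The sketch assumes one particular reading of the definition of $\gamma$: bit positions in the n-bit representation are counted from the most significant end. Under that reading, $\gamma(\zeta(i,j)) = i$, i.e., $\gamma$ returns the recursion level of the cosine-sine matrix $A_{\zeta(i,j)}$. The function names are hypothetical.

```python
def zeta(n: int, i: int, j: int) -> int:
    """Position of A_j^i in the final product: zeta(i, j) = 2^(n-i-1) (2j - 1)."""
    return 2 ** (n - i - 1) * (2 * j - 1)


def gamma(n: int, l: int) -> int:
    """gamma(l) + 1 = position of the least significant 1 of l in its n-bit string,
    with positions ASSUMED to count from the most significant bit (position 1)."""
    lowest = l & -l                      # isolates the least significant set bit
    return n - lowest.bit_length()


def cs_levels(n: int) -> dict:
    """Map each product index zeta(i, j) to its recursion level i (i = 1..n-1)."""
    return {zeta(n, i, j): i for i in range(1, n) for j in range(1, 2 ** (i - 1) + 1)}


levels = cs_levels(3)
print(sorted(levels.items()))   # [(1, 2), (2, 1), (3, 2)]
```

The keys run over $1, \ldots, 2^{n-1} - 1$ exactly once. The final product therefore interleaves $2^{n-1} - 1$ cosine-sine matrices with the block-diagonal matrices $B_1, \ldots, B_{2^{n-1}}$.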

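We find that each of the matrices $A_j^i$ in Eq. (12) corresponds to a gate $F_i^{n-1}(R_y)$. Furthermore, the $2 \times 2$ block-diagonal matrices $B_j$ may, with a suitable choice of $P_j^{(i)}$, be expressed as

$$B_j = F_n^{n-1}(R_z)\,F_n^{n-1}(R_y)\,F_{\gamma(j)}^{n-1}(R_z), \tag{13}$$

where $R_y$ and $R_z$ denote the rotations $R_{\mathbf a}$ about the y and z axes.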



$$A_j^i = \mathrm{diag}_k \begin{pmatrix} c_j^{i,k} & s_j^{i,k} \\ -s_j^{i,k} & c_j^{i,k} \end{pmatrix}, \qquad (k = 1, \ldots, 2^{i-1}), \tag{8}$$



where Eq. (7) applies also to $U_j$. For the ith level of the recursion, we obtain

$$U_j^{i-1} = U_{2j-1}^i\,A_j^i\,U_{2j}^i, \tag{9}$$

where the indexing function $\zeta(i,j) = 2^{n-i-1}(2j-1)$ has been introduced to simplify the notation for the result of the recursion: the matrix $A_j^i$ is also referred to as $A_{\zeta(i,j)}$. As compared with the original matrix $U_j^{i-1}$, the above decomposition contains $2^{n-1}$ additional degrees of freedom. The two block-diagonal factors together already carry as many real parameters as $U_j^{i-1}$, so the $2^{n-1}$ angles of $A_j^i$ are surplus. To specify them explicitly, we define unitary diagonal matrices



$$P_j^i = \mathrm{diag}_k\big(p_j^{i,k},\, p_j^{i,k}\big), \tag{10}$$
where $j = 1, \ldots, 2^i$, $k = 1, \ldots, 2^{i-1}$, and the diagonal matrix $p_j^{i,k} = \mathrm{diag}_l(e^{i\varphi_l})$ with $l = 1, \ldots, 2^{n-i}$. The angles $\{\varphi_l\}$ may be chosen arbitrarily for each $p_j^{i,k}$ and, as shown below, we can use them to reduce the total number of gates needed in the final decomposition. We insert $I = P_{2j-1}^i(P_{2j-1}^i)^\dagger$ into Eq. (9), next to $A_j^i$, with which $P_{2j-1}^i$ commutes, and obtain
and combined with the subsequent $A_j$ into a BA section:

$$(BA)_j = F_n^{n-1}(R_z)\,F_n^{n-1}(R_y)\,F_{\gamma(j)}^{n-1}(R_z)\,F_{\gamma(j)}^{n-1}(R_y). \tag{14}$$

The final matrix $B_{2^{n-1}}$, for which we have no extra degrees of freedom left, must be implemented as

$$B_{2^{n-1}} = F_n^{n-1}(R_z)\,F_n^{n-1}(R_y) \times F_n^{n-1}(R_z)\,F_{n-1}^{n-2}(R_z) \cdots F_1^{0}(R_z)\,\Phi, \tag{15}$$

where $\Phi$ is an elementary phase gate which serves to fix the unobservable global phase. To illustrate the method, the complete decomposition of a general three-qubit gate is shown in Fig. 3.

Figure 3: Quantum circuit for a three-qubit gate obtained using the cosine-sine decomposition. The sequences of gates $B_j$ correspond to the $2 \times 2$ block-diagonal matrices and the gates $A_j$ to the cosine-sine matrices. The leftmost gate sequence corresponds to the diagonal quantum computer of Ref. [14].

Each of the BA sections consists of two uniformly controlled z rotations and two uniformly controlled y rotations. Each of these four gates has $n-1$ controls and thus contributes $2^{n-1}$ rotations and $2^{n-1}$ CNOTs. By mirroring the circuits of the y rotations, we may cancel four CNOT gates in each section. Hence the cost of each of the $2^{n-1} - 1$ sections is $2^{n+1}$ elementary one-qubit rotations and $2^{n+1} - 4$ CNOTs. The final B matrix decomposes into uniformly controlled z and y rotations followed by a cascade of uniformly controlled z rotations which fixes the phases. This cascade corresponds to the diagonal quantum computer of Ref. [14]. Applying the mirroring trick, two more CNOT gates are cancelled between the z and y rotations. The cost of the last B section is $2^{n+1}$ elementary one-qubit gates and $2^{n+1} - 4$ CNOTs. Finally, since $(2^{n-1} - 1)(2^{n+1} - 4) + (2^{n+1} - 4) = 4^n - 2^{n+1}$, we arrive at the total complexity of the decomposition: $4^n - 2^{n+1}$ CNOT gates and $4^n$ elementary one-qubit gates.
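The gate count can be reproduced from its ingredients. The sketch below rests on the costs stated above. A uniformly controlled rotation with $k \ge 1$ controls costs $2^k$ rotations and $2^k$ CNOTs, and an uncontrolled rotation costs no CNOT. Mirroring removes 4 CNOTs per BA section and 2 in the final B section. The final B section is read as in Eq. (15), with the phase gate $\Phi$ counted as one elementary one-qubit gate. The function names are hypothetical.

```python
def ucr_cost(k: int) -> tuple:
    """(one-qubit rotations, CNOTs) of a uniformly controlled rotation with k controls."""
    return 2 ** k, (2 ** k if k >= 1 else 0)


def csd_gate_counts(n: int) -> tuple:
    """(elementary one-qubit gates, CNOTs) of the CSD circuit for an n-qubit gate, n >= 2."""
    r, c = ucr_cost(n - 1)
    # BA section: two z and two y rotations with n-1 controls; mirroring cancels 4 CNOTs
    ba_rot, ba_cnot = 4 * r, 4 * c - 4
    # final B section: z and y rotations with n-1 controls (2 CNOTs cancelled) ...
    last_rot, last_cnot = 2 * r, 2 * c - 2
    # ... followed by the diagonal cascade F_n^{n-1}(R_z) ... F_1^0(R_z) and the phase gate
    for k in range(n):
        cr, cc = ucr_cost(k)
        last_rot += cr
        last_cnot += cc
    last_rot += 1
    sections = 2 ** (n - 1) - 1
    return sections * ba_rot + last_rot, sections * ba_cnot + last_cnot


for n in (2, 3, 4):
    print(n, csd_gate_counts(n))
# 2 (16, 8)
# 3 (64, 48)
# 4 (256, 224)
```

These values match the closed forms $4^n$ and $4^n - 2^{n+1}$, as well as the figures quoted for two-, three- and four-qubit gates in the conclusion.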

In conclusion, the proposed decomposition of a general multi-qubit gate, based on the CSD and uniformly controlled rotations, provides a quantum circuit that contains the minimal number of elementary one-qubit gates and roughly four times the theoretical lower bound on the number of CNOT gates. Compared with the minimal decomposition of a two-qubit gate [7, 9, 10, 11], the CSD method requires 5 extra CNOT gates. For a three-qubit gate, the CSD requires 48 CNOT gates and 64 elementary one-qubit gates, as opposed to the circuit of 40 CNOTs and 98 elementary one-qubit gates obtained using the KGD in Ref. [12]. For four-qubit gates, the CSD provides a quantum circuit of 256 elementary one-qubit gates and 224 CNOTs, which is the shortest elementary gate array known to implement such a gate. Thus, for a general n-qubit gate with $n \ge 4$, the method presented provides the most efficient quantum circuit known to implement the gate.

To further improve the implementation of a particular quantum gate, one may optimize the synthesized quantum circuit. Possible optimization methods include finding the most efficient CSD factorizations, varying the Gray codes, mirroring the gate arrays of the uniformly controlled rotations, and possibly combining the uniformly controlled y and z rotations into general uniformly controlled gates. Certain quantum gates that are likely to be useful in quantum computation possess internal symmetries and can thus be implemented using only a polynomial number of elementary gates. For example, $O(n^2)$ gates are needed to implement a quantum Fourier transformation (QFT) of n qubits [1]. Although the method presented apparently requires $O(4^n)$ elementary gates, it is still possible that, with proper optimizations, the gate array will simplify appreciably and the result will resemble that of the polynomial decompositions.



This research is supported by the Academy of Fin- 
land through the project "Quantum Computation" (No. 
206457). MM and JJV thank the Foundation of Technol- 
ogy (Finland), JJV the Nokia Foundation, MM and VB 
the Finnish Cultural Foundation, and MMS the Japan 
Society for the Promotion of Science for financial sup- 
port. S. M. M. Virtanen is acknowledged for stimulating 
discussions. 






fo 



.hut.fi 



Electronic address: mpmotton 1 

[1] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2000).

[2] P. Jaksch and A. Papageorgiou, Phys. Rev. Lett. 91, 257902 (2003).

[3] J. P. Paz and A. Roncaglia, Phys. Rev. A 68, 052316 (2003).

[4] D. S. Abrams and S. Lloyd, Phys. Rev. Lett. 83, 5162 (1999).

[5] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. H. Margolus, P. W. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter, Phys. Rev. A 52, 3457 (1995).

[6] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. (Johns Hopkins Press, Baltimore, 1996).

[7] V. V. Shende, I. L. Markov, and S. S. Bullock, Phys. Rev. A 69, 062321 (2004).

[8] J. J. Vartiainen, M. Möttönen, and M. M. Salomaa, Phys. Rev. Lett. 92, 177902 (2004).

[9] F. Vatan and C. P. Williams, Phys. Rev. A 69, 032315 (2004).

[10] J. Zhang, J. Vala, S. Sastry, and K. B. Whaley, Phys. Rev. Lett. 93, 020502 (2004).

[11] G. Vidal and C. M. Dawson, Phys. Rev. A 69, 010301 (2004).

[12] F. Vatan and C. P. Williams (2004), quant-ph/0401178.

[13] N. Khaneja and S. Glaser, Chem. Phys. 267, 11 (2001).

[14] S. S. Bullock and I. L. Markov, Quant. Inf. Comput. 4, 27 (2004).

[15] C. C. Paige and M. Wei, Linear Algebra and Appl. 208, 303 (1994).

[16] R. R. Tucci (2001), 2nd Edition, quant-ph/9902062.

[17] S. S. Bullock (2004), quant-ph/0403141.

[18] C. Savage, SIAM Rev. 39, 605 (1997).



